@Krashnicov

Fix: Add adaptive thresholding for small documents in document query

Follows up on PR agent0ai#788 (Subordinate agents settings override).

When subordinate agents have a settings.json in their profile directory,
the profile was getting reset to 'agent0' instead of preserving the
intended profile.

Root cause: initialize_agent() creates a new config with the default
profile. memory_subdir was carried over to the new config, but profile was not.

Fix: Apply the same preservation pattern used for memory_subdir to profile.

- Migrate deprecated [real_ip] section to [botdetection.proxy]
- Add trusted_proxies configuration under [botdetection.proxy]
- Remove 'Ahmia blacklist' from enabled_plugins in settings.yml

This aligns with the latest SearXNG configuration requirements
and removes a plugin that causes issues.

Small documents (< 10 chunks) were failing to return content due to:
- High similarity thresholds (0.5 default) filtering out all chunks
- Query optimization rewriting the query so that it drifted from the original intent
- No semantic matches being found, resulting in empty results

Changes:
- Pre-index documents and count chunks before querying
- Calculate adaptive threshold based on document sizes:
  * < 5 chunks: threshold 0.0 (accept all)
  * 5-10 chunks: threshold 0.3 (lenient)
  * > 10 chunks: DEFAULT_SEARCH_THRESHOLD (standard)
- Skip query optimization for tiny docs to preserve specificity
- Fallback: If no semantic matches found, include all chunks for small docs
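
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the `(text, score)` pair shape, and the value `0.5` for `DEFAULT_SEARCH_THRESHOLD` are assumptions (the description only says the default is 0.5), and "tiny" is assumed to mean fewer than 5 chunks. Scores are assumed to lie in the range 0 to 1.

```python
from typing import List, Sequence, Tuple

# Assumed value; the PR description cites 0.5 as the default threshold.
DEFAULT_SEARCH_THRESHOLD = 0.5


def adaptive_threshold(chunk_count: int,
                       default: float = DEFAULT_SEARCH_THRESHOLD) -> float:
    """Pick a similarity threshold from the number of indexed chunks."""
    if chunk_count < 5:
        return 0.0   # tiny document: accept every chunk
    if chunk_count <= 10:
        return 0.3   # small document: lenient threshold
    return default   # larger document: standard threshold


def should_optimize_query(chunk_count: int) -> bool:
    """Skip query optimization for tiny documents (assumed: < 5 chunks)."""
    return chunk_count >= 5


def select_chunks(scored_chunks: Sequence[Tuple[str, float]],
                  chunk_count: int) -> List[str]:
    """Filter (text, score) pairs by the adaptive threshold.

    If nothing passes and the document is small (<= 10 chunks),
    fall back to returning every chunk instead of an empty result.
    """
    threshold = adaptive_threshold(chunk_count)
    matches = [text for text, score in scored_chunks if score >= threshold]
    if not matches and chunk_count <= 10:
        return [text for text, _ in scored_chunks]
    return matches
```

With this sketch, a 7-chunk document whose best score is 0.2 still returns all of its chunks through the fallback, while a 20-chunk document with the same scores returns nothing, as it did before the change.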

Impact:
- No empty results for tiny docs (< 5 chunks), since every chunk is accepted
- Lower latency for tiny docs (query optimization is skipped)
- Specific queries against small docs keep their original wording
- Graceful fallback when semantic search fails